1. INTRODUCTION

Computer vision is an imaging technique using a computer and a camera that has had many 
applications in agriculture. It has flourished in the last two decades for evaluating the quality of fruits and 
vegetables. Ripeness, external or internal damages, and chemical contents are among the quality attributes 
often used for classifications and predictions of fruits and vegetables [1]. Machine vision uses computer 
vision with other instruments to perform automatic tasks, especially for non-destructive and fast sorting and 
grading of fruits and vegetables. Machine vision aims to replace the tedious, time-consuming manual sorting and
grading process [2]. Computer vision obtains images of fruits and vegetables and performs many steps, such
as preprocessing, segmentation, feature extraction, and classification. Information extracted from the 
resulting images can represent the qualities of fruits and vegetables based on external and internal 
characteristics [3]. 

Computer vision methods have evolved rapidly due to technological advances in computers, image 
detectors, and image processing methods. Conventional computer vision uses a color camera and white light. 
It is applied to assess the external characteristics of fruits, such as color, shapes, sizes, or textures. Spectral 
imaging is a computer vision technique that combines imaging methods and spectroscopy. It captures not only
spatial but also spectral information. Spectral imaging includes hyperspectral and multispectral imaging.
Hyperspectral imaging has advantages over traditional imaging because it covers a continuous wavelength
region, providing higher image resolution [4]. It can predict maturity and ripeness based on the internal
characteristics of fruits and vegetables [5]. Hyperspectral imaging has been used widely in food industries for 
evaluating the quality and safety of food products [6]. It has been able to predict apple internal quality [7],
the moisture content (MC), dry matter content (DMC), and firmness (F) of dates [8], and the dry matter content of
avocado [9].

Many types of computer vision require machine learning to classify fruits and predict fruit qualities, 
including hyperspectral imaging. Machine learning is part of artificial intelligence used intensively in 
computer vision. It contains algorithms to analyze, learn, and make decisions from datasets. Deep learning is
an advancement of machine learning that is more robust because it uses more complex algorithms. It is in
demand since many multifaceted problems are found in fruit and vegetable classification [10]. Artificial
neural networks (ANN) and convolutional neural networks (CNN) are two types of deep learning methods
used in many applications in fruit classification. This machine learning type has been applied with
hyperspectral imaging to classify or predict fruit physical-chemical characteristics, such as predicting the
firmness and soluble solid contents of Korla pear [11] and tomatoes [12].

Classification of fruits and vegetables using hyperspectral imaging and an ANN model needs reliable input
datasets. Some image preprocessing steps are essential for hyperspectral datasets due to random noise from
many sources, such as misalignments of optical components, sensor sensitivity, and inappropriate reflectance
calibrations [13]. One of the preprocessing techniques for hyperspectral datasets is Savitzky-Golay (SG)
filtering. SG filtering smooths a signal by fitting a low-degree polynomial to successive subsets of adjacent
data points using the least-squares method [14]. This method has been used for hyperspectral images of peanut seed
vigor [15] and strawberry water content estimation and ripeness classification [16]. The next step after SG 
filtering is to validate the training datasets using k-fold cross-validation. K-fold cross-validation is one of the 
methods used to validate an estimation model and find reliable variables [17]. It is a popular procedure for 
evaluating the performance of classification algorithms [18]. This technique is part of preprocessing methods 
used in machine learning for wide-ranging problems, such as predicting the notch frequency of an 
ultra-wideband (UWB) antenna [19] and quality attributes of orange fruit using hyperspectral imaging [20]. 

Crude palm oil (CPO) is one of the export commodities which contribute to the economic growth of 
countries in Southeast Asia, such as Indonesia and Malaysia. However, these industries have faced crucial 
problems such as CPO quality, process automation, and environmental issues that challenge the industry 
sustainability. Smallholder plantations have less access to certification bodies [21]. Oil palm fresh fruit
bunches (FFBs) are the source of CPO. The main CPO quality attributes are oil content and free fatty acids,
which relate to the ripeness of oil palm FFBs. High oil content and low free fatty acids are the desirable
qualities of oil palm FFBs arriving at a palm oil refinery. Sorting and grading FFBs are crucial
processes in obtaining high-quality FFBs. However, in practice, they are still done manually and
destructively. Electronic sensors and imaging techniques have been shown capable of predicting the ripeness
levels of oil palm FFBs and improving the sorting and grading processes. A detection system has used a 670 nm light
source and an optical sensor to determine the FFB ripeness levels [22]. Moreover, imaging techniques such 
as thermal imaging [23], laser-induced fluorescence imaging [24], and near-infrared (NIR) spectroscopy [25] 
have been proposed to determine and predict FFB ripeness. 

In this study, we developed a hyperspectral imaging-based machine vision that suits the environment 
in the reception area at a palm oil refinery facility. The system consisted of a conveyor unit, a hyperspectral 
imaging unit, a light-tight box, and an image processing software unit. Hyperspectral imaging has been used 
for oil palm FFB ripeness detection with K-means clustering analysis [26] and an ANN model [27]. Most of the
innovations regarding the prediction of oil palm FFB ripeness applied traditional computer vision and other instruments.
Traditional computer vision uses a webcam or smartphone with color spaces such as red green blue (RGB) or
hue saturation value (HSV). Bulky commercial spectrometers were also used, which are difficult to
integrate into real-time machine vision. Some systems were on a laboratory scale. More study is necessary to
efficiently implement hyperspectral imaging and an ANN model in a real-time oil palm FFB sorting and grading
machine vision system. We proposed a feed-forward ANN model to predict FFB ripeness levels, categorized as
unripe (immature) and ripe (mature). We used SG filtering and k-fold cross-validation techniques on the 
datasets of spectral reflectance intensities, resulting from the hyperspectral imaging system before being used 
in the ANN model. A confusion matrix measured the accuracy of the prediction. We used self-written
MATLAB-based software for the image processing and analysis. This paper contains an introduction,
method, results, and discussion, followed by a conclusion.

2. METHOD

Designing ANN models with k-fold cross-validation for predicting the ripeness levels of oil palm
FFBs required several stages. The first stage was the acquisition of FFB images using a hyperspectral imaging
system. The second was to resize the hyperspectral images, reduce their format, and calibrate each image
using a white reference image and a dark image. The next stage was to impose the region of interest (ROI)
and average over it to obtain spectral data represented by the average reflectance intensities versus wavelength for
each FFB. Next, SG smoothing and k-fold cross-validation were applied to prepare and validate the spectral datasets
for the ANN model. The last step was to design and implement the ANN on the hyperspectral datasets. We used a
confusion matrix to measure the prediction performance and a graphical user interface (GUI) to display the 
SG smoothing and ANN prediction. 


2.1. Hyperspectral images 

The hyperspectral images of oil palm FFB were acquired using a hyperspectral imaging system, as 
shown in Figure 1 [26]. The system consisted of a Sentech NIR monochrome camera with a resolution of 2.2 MP and a
sensor size of 2/3”, a Specim ImSpector V10 spectrograph covering the 400-1000 nm (Vis-NIR) region, a pair
of Dolan Jenner halogen line light sources, a belt conveyor, and a control unit. The camera has a Senko 25 mm,
2/3” lens. The hyperspectral imaging system used a line-scanning scheme controlled by a MATLAB-based
acquisition program and was housed in a light-tight black box to minimize room light. The distance from the
camera lens end to the conveyor surface was 83 cm. The line lights were positioned at each side of the box, forming
a 45° angle to the vertical line.


Figure 1. The optical and hardware components of the hyperspectral system


This study used samples of oil palm FFBs from a nearby plantation about 7 kilometers from our
laboratory. The harvested Tenera FFB samples were first categorized as fractions F0, F1, F2, F3, and
F4 based on color changes and the number of loose fruits [23], with the help of experienced harvesters. The
standards for ripeness fractions categorize F0 and F1 as unripe, F2 and F3 as ripe, and F4 as overripe. The
FFB fractions were classified further for this study as unripe or immature (F0, F1) and ripe or mature (F2, F3,
and F4) since F4 fraction FFBs are acceptable in the sorting area. Table 1 shows the color images of unripe
and ripe oil palm FFBs captured from three sides; the image of the FFB back side was excluded because of
infertile fruits on half of that side. Unripe or immature FFBs have colors ranging from blackish to purple,
while ripe or mature FFBs have red to orange colors, with some fruits loosened from the bunches. Each FFB has
a ripeness fraction symbol on its stalk for easy identification.

Image acquisition was performed on each oil palm FFB moving on a conveyor. First, the hyperspectral
optical unit was calibrated using three visible lasers with wavelengths of 405 nm, 532 nm, and 650 nm.
The aim was to calibrate the pixels on the wavelength axis of a hyperspectral cube image, where the
Specim V10 spectrograph has a wavelength range of 400-1,000 nm. We ran 23 image acquisition
sessions to build the ANN model dataset. Each session had 8 FFBs from 4 ripeness fractions (184 FFBs in total).
The FFBs were scanned within 24 hours after harvesting to maintain the ripeness levels. The
image acquisition used a line-scan scheme at 250 frames per second. We also took white and dark
reference images. The acquisition process saved the recorded images in MAT-file (.mat) format for image
processing.
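
The three-laser calibration of the wavelength axis can be illustrated with a short sketch. It is written in Python rather than the authors' MATLAB, and it assumes a linear pixel-to-wavelength relation fitted by least squares. The laser wavelengths come from the text; the peak pixel positions in `PEAK_PIXELS` are hypothetical values chosen only for illustration.

```python
def fit_line(pixels, wavelengths):
    """Least-squares fit of wavelength = slope * pixel + intercept."""
    n = len(pixels)
    mean_p = sum(pixels) / n
    mean_w = sum(wavelengths) / n
    sxx = sum((p - mean_p) ** 2 for p in pixels)
    sxy = sum((p - mean_p) * (w - mean_w) for p, w in zip(pixels, wavelengths))
    slope = sxy / sxx
    return slope, mean_w - slope * mean_p


# Laser wavelengths are stated in the text; the peak rows are hypothetical.
LASER_NM = [405.0, 532.0, 650.0]
PEAK_PIXELS = [5.0, 120.0, 227.0]

slope, intercept = fit_line(PEAK_PIXELS, LASER_NM)


def pixel_to_nm(pixel):
    """Map a row index on the spectral axis to a wavelength in nanometres."""
    return slope * pixel + intercept
```

With the fitted line, every row of the spectral axis can be assigned a wavelength, which is what allows the reflectance spectra to be plotted against wavelength rather than pixel index.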


Table 1. Images of three sides of an oil palm bunch
Ripeness category    Front side    Right side    Left side
Unripe (immature)    (image)       (image)       (image)
Ripe (mature)        (image)       (image)       (image)


2.2. Image preprocessing and spectral mean conversion 

The image processing for oil palm FFB images aimed to obtain the reflectance intensities of light in
the wavelength range of 400-1000 nm. The oil palm FFB images obtained using the hyperspectral system
and MATLAB acquisition software have a size of 1088×2048 px in 4D format. The first step of the image
processing was to resize the images for faster processing, convert them into 3D format, and store
them in MAT-file (.mat) format. The 3D spectral images had a matrix format of (λ, y, x), with λ as the wavelength
and x, y as spatial coordinates. One of the essential steps in spectral image processing is to correct each
spectral image using a standard white reference image and a dark image. Equation (1) shows this relation:

$$I_c = \frac{I_r - I_d}{I_w - I_d} \tag{1}$$

where I_c is the corrected intensity, and I_r, I_d, and I_w are the raw hyperspectral image intensity, the dark
image intensity, and the white reference image intensity, respectively [26].
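
As a minimal sketch of this white/dark reference correction, the Python function below applies the usual form (raw minus dark) divided by (white minus dark) to a list of intensities. It is not the authors' MATLAB code; the function name, the flat-list data layout, and the `eps` guard against a zero denominator are assumptions made for illustration.

```python
def calibrate(raw, white, dark, eps=1e-12):
    """Return (raw - dark) / (white - dark) for each intensity value.

    All three inputs are equal-length sequences, for example one spectrum
    or a flattened image. Where white and dark coincide the result is set
    to 0.0 instead of dividing by zero (an assumption of this sketch).
    """
    corrected = []
    for r, w, d in zip(raw, white, dark):
        span = w - d
        corrected.append((r - d) / span if abs(span) > eps else 0.0)
    return corrected


# Toy values: a pixel halfway between the dark and white references maps to 0.5.
example = calibrate(raw=[60.0, 110.0], white=[110.0, 210.0], dark=[10.0, 10.0])
```

The correction removes the sensor's dark offset and normalizes for uneven illumination, so a perfect white target maps to 1 and a dark frame maps to 0.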


The resized, reduced, and corrected spectral images had a dimension of (λ, y, x). Then, the spectral
images were converted to a matrix format of (x, y, λ). The final hyperspectral images had a size of
(1024×1088×544), that is, 544 wavelengths. Spectral mean conversion for each image was performed to
get the mean reflectance intensities over the wavelength range of 400-1,000 nm for each FFB sample [13]. The
image processing stage yielded 523 usable spectral data points.
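
The spectral mean conversion can be sketched as an average over the pixels of a region of interest, band by band. The Python function below assumes the cube is a nested list indexed as `cube[x][y][band]`, matching the (x, y, λ) layout described above, and that the ROI is a rectangle given by half-open index ranges; both choices are assumptions for illustration.

```python
def mean_spectrum(cube, x_range, y_range):
    """Average each band over a rectangular ROI of a cube[x][y][band]."""
    x0, x1 = x_range
    y0, y1 = y_range
    n_bands = len(cube[0][0])
    totals = [0.0] * n_bands
    count = 0
    for x in range(x0, x1):
        for y in range(y0, y1):
            for band, value in enumerate(cube[x][y]):
                totals[band] += value
            count += 1
    if count == 0:
        raise ValueError("ROI contains no pixels")
    return [t / count for t in totals]


# Toy 2 x 2 cube with 3 bands; the ROI covers all four pixels.
toy_cube = [
    [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
    [[5.0, 6.0, 7.0], [7.0, 8.0, 9.0]],
]
toy_mean = mean_spectrum(toy_cube, (0, 2), (0, 2))
```

Each FFB image is reduced in this way to a single spectrum, one mean reflectance value per wavelength, which is the form used by the later smoothing and classification steps.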


2.3. Smoothing and k-fold cross-validation 

Smoothing is a preprocessing step applied before the hyperspectral data are passed to the ANN classifier.
The hyperspectral images contain random noise due to optical misalignment, inhomogeneous lighting, and
sensor noise. Here, an SG filter was used to smooth the dataset. After conversion and resizing, the spectral data
become a matrix of size N×544, where N is the number of spectral data points. That is, each spectral data point is
a spectrum with 544 wavelength values. Each spectrum plot consists of reflectance intensity on the y-axis
and wavelength on the x-axis. In this study, we used a MATLAB program to do the SG smoothing. The SG filter
was applied to a sliding window of the hyperspectral spectrum, fitting a polynomial function by the
least-squares method [14]. We used a framelen of 11 and a polynomial order of 3, with a dt of 1/551.
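
To make the SG step concrete, the sketch below derives the smoothing weights for a frame length of 11 and a polynomial order of 3 directly from the least-squares normal equations, then applies them as a moving weighted average. It is a pure-Python illustration, not the authors' MATLAB routine; the function names and the edge handling (repeating the first and last values) are assumptions. In practice a library routine such as MATLAB's `sgolayfilt` or SciPy's `scipy.signal.savgol_filter` would normally be used.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a small linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def savgol_weights(frame_len=11, order=3):
    """Weights that give the fitted polynomial's value at the window centre."""
    half = frame_len // 2
    offsets = range(-half, half + 1)
    size = order + 1
    # Normal matrix A^T A for the monomial basis 1, x, x^2, ...
    normal = [[Fraction(sum(x ** (i + j) for x in offsets)) for j in range(size)]
              for i in range(size)]
    unit = [Fraction(1)] + [Fraction(0)] * order
    z = solve(normal, unit)
    return [sum(z[j] * x ** j for j in range(size)) for x in offsets]


def savgol_smooth(spectrum, frame_len=11, order=3):
    """Smooth a spectrum; edges are padded by repeating the end values."""
    weights = [float(w) for w in savgol_weights(frame_len, order)]
    half = frame_len // 2
    padded = [spectrum[0]] * half + list(spectrum) + [spectrum[-1]] * half
    return [sum(w * padded[i + k] for k, w in enumerate(weights))
            for i in range(len(spectrum))]
```

For these settings the weights are the classical 11-point cubic coefficients (-36, 9, 44, 69, 84, 89, 84, 69, 44, 9, -36)/429, so any locally cubic trend in the spectrum passes through unchanged while higher-frequency noise is attenuated.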

The training and testing data need to be compatible with the ANN classifier. Therefore, data selection 
is crucial. The total number of data points obtained previously was 523. However, not all the data points are 
suitable for designing the ANN model. Data selection computes the average of the data points and keeps
the data points whose values are closest to that average. This technique aims to reduce error when applied to the ANN
classifier. This process resulted in 72 data points, ready to be fed into the ANN classifier.
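
One possible reading of this selection step is sketched below in Python. The text does not state the distance measure, whether the average is taken per ripeness class, or how the cut-off of 72 was chosen, so the Euclidean distance to a single mean spectrum and the `keep` parameter are assumptions of this sketch rather than the authors' procedure.

```python
import math


def select_closest(spectra, keep):
    """Return indices of the `keep` spectra closest to the mean spectrum."""
    n_bands = len(spectra[0])
    mean = [sum(s[b] for s in spectra) / len(spectra) for b in range(n_bands)]

    def distance(spectrum):
        # Euclidean distance to the mean spectrum (assumed metric).
        return math.sqrt(sum((v - m) ** 2 for v, m in zip(spectrum, mean)))

    ranked = sorted(range(len(spectra)), key=lambda i: distance(spectra[i]))
    return sorted(ranked[:keep])


# Toy data: the outlier at index 3 and the far point at index 0 are dropped.
toy_spectra = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [10.0, 10.0]]
kept = select_closest(toy_spectra, keep=2)
```

Applied per class, a rule of this kind discards atypical spectra, which reduces label noise but also shrinks the dataset, here from 523 to 72 data points.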

Cross-validation is one of the machine learning techniques used to evaluate and test a classifier model. 
The cross-validation method has many forms, including k-fold cross-validation. K-fold cross-validation
aims to select training and testing datasets that yield ANN models with higher accuracy. Another objective of
k-fold cross-validation is to prevent the classifier model from overfitting.

We randomly divided the 72 data points into seven approximately equal parts or subsets [18], as shown in
Table 2. Each subset is called a fold. There were seven folds (fold 1 to fold 7), where folds 3 and 4 had 11 data
points each and the other folds had 10 data points each. The iterations for folds 3 and 4 had 11 testing data
points and 61 training data points, hence a testing-to-training ratio of 15:85. Folds 1, 2, 5, 6, and 7 each had 10
testing data points and 62 training data points, hence a testing-to-training ratio of 13:87. Each fold was tested for its
accuracy and compared to the other folds. The fold with the highest accuracy was used for the ANN model.

Table 2 shows the partition of the 1st to 7th folds over seven iterations. At the first iteration, the 1st fold
subset was used as the testing data, while the 2nd to 7th fold subsets were the training data. Similarly, at the second
iteration, the 2nd fold subset was used as the testing data, and the 1st fold and the 3rd to 7th fold subsets were the
training data. The iteration process continued correspondingly. After the iteration process, the accuracy
of each subset was calculated. This process was intended to find the average accuracy level of the designed ANN
model and which fold had the highest accuracy.


Table 2. Separation of training and testing data

Cross validation   Fold 1   Fold 2   Fold 3   Fold 4   Fold 5   Fold 6   Fold 7
Iteration 1        Test     Train    Train    Train    Train    Train    Train
Iteration 2        Train    Test     Train    Train    Train    Train    Train
Iteration 3        Train    Train    Test     Train    Train    Train    Train
Iteration 4        Train    Train    Train    Test     Train    Train    Train
Iteration 5        Train    Train    Train    Train    Test     Train    Train
Iteration 6        Train    Train    Train    Train    Train    Test     Train
Iteration 7        Train    Train    Train    Train    Train    Train    Test
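
The rotation in Table 2 can be sketched in a few lines of Python. The function names and the fixed random seed are assumptions for illustration, and this sketch gives the two extra data points to the first two folds, whereas in the study folds 3 and 4 held 11 points.

```python
import random


def make_folds(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def iterate_folds(folds):
    """Yield (train, test) index lists, using each fold once as test data."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


folds = make_folds(72, 7)
split_sizes = [(len(train), len(test)) for train, test in iterate_folds(folds)]
```

For 72 data points and seven folds, every split has either 61 training and 11 testing points or 62 training and 10 testing points, and every data point is used for testing exactly once.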


2.4. Designing and implementation of ANN model 

After completing the k-fold cross-validation process, we designed an ANN. The ANN was a 
feedforward ANN with backpropagation functions. Figure 2 shows the architecture of the ANN. It has a
499-10-1 structure, which indicates 499 inputs, 10 hidden neurons, and 1 output. Input data consisted of 72 data points,
each having 544 values. The ANN used testing-to-training ratios of 13:87 and 15:85, as described previously in
the k-fold cross-validation process.

The next step was to use the ANN model for the oil palm FFB ripeness prediction. Before predicting 
the ripeness levels of oil palm FFB, the ANN was trained using data subsets from the k-fold cross-validation 
until reaching the optimal results. The ANN parameters included 100 epochs and a learning rate of 0.1. The
ANN system was written in MATLAB version 9.8.0.1323502 and equipped with a GUI. The GUI
displayed the hyperspectral spectrum before and after smoothing using the SG filter. It also showed the
estimation result for the ripeness levels of oil palm FFBs. The ANN model was implemented on the
hyperspectral images taken using the imaging system. The ANN algorithm was considered suitable for
implementation if the prediction accuracy was more than 75%.
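
A feedforward network trained by backpropagation can be illustrated with the small pure-Python sketch below. It is not the authors' MATLAB model: the class name `TinyMLP`, the sigmoid activations, the squared-error loss, the per-sample weight updates, and the weight initialization range are all assumptions. The defaults of 10 hidden units, 100 epochs, and a learning rate of 0.1 follow the values given in the text.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer, sigmoid units, trained by per-sample backpropagation."""

    def __init__(self, n_in, n_hidden=10, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def train(self, samples, targets, epochs=100, lr=0.1):
        """Return the mean squared error recorded during each epoch."""
        history = []
        for _ in range(epochs):
            total = 0.0
            for x, t in zip(samples, targets):
                hidden, out = self.forward(x)
                err = out - t
                total += err * err
                delta_out = err * out * (1.0 - out)
                for j, h in enumerate(hidden):
                    # Hidden delta uses the output weight before it is updated.
                    delta_h = delta_out * self.w2[j] * h * (1.0 - h)
                    self.w2[j] -= lr * delta_out * h
                    self.b1[j] -= lr * delta_h
                    for i, v in enumerate(x):
                        self.w1[j][i] -= lr * delta_h * v
                self.b2 -= lr * delta_out
            history.append(total / len(samples))
        return history

    def predict(self, x):
        """Return 1 (ripe) if the output is at least 0.5, otherwise 0 (unripe)."""
        return 1 if self.forward(x)[1] >= 0.5 else 0
```

In use, each input `x` would be one smoothed reflectance spectrum and each target would be 0 for unripe or 1 for ripe; the single sigmoid output is thresholded at 0.5 to give the predicted class.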


2.5. Accuracy analysis using confusion matrix 

Using metrics is the final step to measure the accuracy of the ANN model for the hyperspectral
images. A confusion matrix is one of the popular methods to measure ANN classifier performance. A confusion
matrix describes the performance of an estimation model in a chart, as shown in Figure 3 [28]. The chart in Figure 3
contains the accuracy and statistical parameters, such as error rate, recall, and precision. In this study, the
confusion matrix was used to describe the performance of each fold validation.

Figure 2. Structure of the designed neural network

Figure 3. Confusion matrix of prediction and actual values


The accuracy calculation uses positive and negative values, which are 1 for positive and 0 for
negative. For this prediction, a negative value means an unripe FFB, while a positive
value means a ripe FFB. Equations for accuracy, error rate, recall, and precision [29] are given
in (2)-(5), respectively. Here, true positive (TP), true negative (TN), false positive (FP), and false negative
(FN) are defined as TP=correct prediction result with positive value, TN=correct prediction result with
negative value, FP=incorrect prediction result with positive value, and FN=incorrect prediction result with
negative value.


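$$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN} \times 100\% \tag{2}$$

$$\text{Error Rate} = \frac{FP+FN}{TP+TN+FP+FN} \times 100\% \tag{3}$$

$$\text{Recall} = \frac{TP}{TP+TN} \times 100\% \tag{4}$$

$$\text{Precision} = \frac{TP}{TP+FP} \times 100\% \tag{5}$$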
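
The four metrics can be checked numerically with the short Python sketch below. The counts used for fold 1 (TP=3, TN=5, FP=0, FN=2) are inferred from the fold 1 row of Table 3, where 3 of 5 mature and all 5 immature bunches were predicted correctly. Note that recall as printed in (4) uses TP+TN in the denominator, which reproduces the recall values reported in Table 4; the conventional definition of recall is TP/(TP+FN), so the sketch reports both. The function and key names are assumptions for illustration.

```python
def percent(numerator, denominator):
    """Return a percentage, or 0.0 when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else 0.0


def confusion_metrics(tp, tn, fp, fn):
    """Compute the metrics of equations (2)-(5) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": percent(tp + tn, total),
        "error_rate": percent(fp + fn, total),
        "recall_eq4": percent(tp, tp + tn),       # as printed in (4)
        "recall_standard": percent(tp, tp + fn),  # conventional definition
        "precision": percent(tp, tp + fp),
    }


# Fold 1 counts inferred from Table 3.
fold1 = confusion_metrics(tp=3, tn=5, fp=0, fn=2)
```

For fold 1 this gives an accuracy of 80%, an error rate of 20%, a precision of 100%, and a recall of 37.5% under (4), compared with 60% under the conventional definition.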


3. RESULTS AND DISCUSSION 
3.1. Savitzky-Golay smoothing 

Figure 4 shows the original hyperspectral spectrum of an oil palm fresh fruit bunch compared to the
spectrum filtered using the SG algorithm. The SG filter is one of the preprocessing techniques that is also
applied to hyperspectral images [30]. Hyperspectral spectra contain noise due to the complexity of the
hyperspectral imaging system, with many electronic and optical components involved. Therefore, spectral
preprocessing and calibration of the image data are needed [13]. Figure 4 shows that SG smoothing has reduced the
spectral noise significantly.


Neural network with k-fold cross validation for oil palm fruit ripeness prediction (Minarni Shiddiq) 


170 o ISSN: 1693-6930 


As shown in Figure 4, the hyperspectral spectrum has the relative reflectance intensity on the y-axis
and the wavelength range on the x-axis. The x-axis is associated with the wavelength ranges of the camera
detector, lens, spectrograph, and halogen lamps in the 400-1000 nm region. The visible-infrared spectrum has
higher intensities in the region of 700-900 nm, a pattern also found in similar research using hyperspectral
and infrared imaging on oil palm fresh fruit bunches [25], [27]. The measurement using four-band optical
sensors showed higher reflectance intensities in the infrared region (700-900 nm) due to lower absorbance
by chlorophyll and anthocyanin in the mesocarp layer [22].


Figure 4. Original hyperspectral spectrum and the spectrum after SG filtering (y-axis: reflectance intensity; x-axis: wavelength)


3.2. Displaying ANN results 

A graphical user interface (GUI) was created for this study, which can be used for any
hyperspectral data of oil palm FFBs. It has two buttons, one to input hyperspectral images and one
to process the data and predict the result using the ANN. Figure 5 shows the GUI front view. The front
view of the GUI has two boxes for a hyperspectral image: the resulting spectra (left) and the SG-filtered spectra
(right).


Figure 5. The GUI front view with spectral data input and the prediction result

The operation of the GUI starts with the two available buttons. Each hyperspectral image is
loaded by clicking the <input hyperspectral image> button. The left box of the GUI shows the
hyperspectral spectra of the loaded image. The next step is to process the hyperspectral data point. By
clicking the <predict> button, the hyperspectral spectra are filtered by the SG filter algorithm and displayed
on the right. The bottom right section shows the ANN prediction result for the oil palm FFB. The result has
two options, immature (unripe) or mature (ripe). Oil palm FFBs arriving at the reception area of an oil palm
mill are often categorized using two ripeness levels, unripe and ripe. The immature FFBs are then separated.
Other FFB conditions are related to external damage, such as rotten, empty, and long-stalk FFBs. These
categories need other sorting methods, such as object detection.


3.3. Testing data and prediction result 

Testing the hyperspectral data of oil palm FFB using the k-fold cross-validation method
gave the highest accuracy for the 2nd, 5th, and 7th folds, with 90% accuracy. The results show that the
k-fold cross-validation method can measure the ANN accuracy. One of the standard metrics used to measure
the performance of an ANN classifier based on k-fold validation is the mean squared error (MSE). Figure 6
shows the predicted and the actual results of the ANN model for the 7th fold prediction, which gives an MSE
of 8.9924e-23. K-fold cross-validation works by finding the lowest MSE as the ‘optimal’ model. The
discrepancy between the target and the ANN model could be due to the averaging effect [17] and noisy
hyperspectral images.
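
For reference, the MSE between target values and network outputs can be computed as in the Python sketch below. The function name and the toy values are assumptions for illustration; the targets are taken to be 0 for immature and 1 for mature, as elsewhere in the paper.

```python
def mean_squared_error(targets, outputs):
    """Average of the squared differences between targets and outputs."""
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


# Toy example: outputs close to their 0/1 targets give a small MSE.
toy_mse = mean_squared_error([1.0, 0.0, 1.0, 0.0], [0.9, 0.1, 0.8, 0.0])
```

Because the MSE is computed on the continuous network outputs, it measures how far the outputs are from their targets rather than how many samples fall on the wrong side of the decision threshold, so it complements the accuracy figures instead of replacing them.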


Figure 6. Comparison graph of prediction results and test data for the 7th fold (plot title: ANN vs Target Output Graph with MSE value = 8.9924e-23)


3.4. Accuracy 

Testing the ANN model using the k-fold cross validation method showed that each fold had a 
different accuracy. Table 3 shows the accuracy results of each fold for the ANN model in predicting the 
ripeness level of oil palm FFB. The accuracy calculation used positive and negative values represented by 1 
and 0. The prediction results also used two states which are immature and mature. The immature state has a 
negative value, and the mature state has a positive value. 

Table 3 shows the highest accuracies given by the 2nd, 5th, and 7th folds, with 90% accuracy.
The lowest accuracy was obtained by the 4th fold, followed by the 6th and 3rd folds, with accuracies below
75%. The lower accuracy on the 3rd, 4th, and 6th folds could be due to the small value of k used. K-fold
cross-validation with more folds and a small number of replications should be used for performance
evaluation [18].

Table 4 shows the evaluation of the ANN model used to predict the ripeness levels of oil palm FFB.
It shows that the overall accuracy across all seven folds is 79.17%. The result also shows that the
highest accuracy belongs to the 2nd, 5th, and 7th folds. The results imply that the ANN model falls in
the <fair classification> prediction category. They also show that the best dataset for training the ANN
model is that of the 7th fold, which gives an accuracy of 90% and is categorized as <excellent
classification>. This result is about 5% lower than that of a similar hyperspectral imaging and ANN experiment
on oil palm FFBs [27]. This was possibly due to the push-broom scheme of hyperspectral imaging used in this
study, in which a slight mismatch between the conveyor speed and the camera frame rate (fps) produced blurred images.

Table 3. Comparison of prediction results and testing data

          Test data            Prediction result
Fold      Mature   Immature    Mature   Immature    Accuracy (%)
Fold-1    5        5           3        7           80
Fold-2    5        5           6        4           90
Fold-3    6        5           5        6           72.72
Fold-4    5        6           3        8           63.63
Fold-5    5        5           4        6           90
Fold-6    5        5           6        4           70
Fold-7    5        5           4        6           90


Table 4. Evaluation results of confusion matrix

Evaluation          Fold 1 (%)  Fold 2 (%)  Fold 3 (%)  Fold 4 (%)  Fold 5 (%)  Fold 6 (%)  Fold 7 (%)  Total (%)
Accuracy            80          90          72.72       63.63       90          70          90          79.17
Misclassification   20          10          27.27       36.36       10          30          10          20.83
Recall              37.5        55.56       50          28.57       44.44       57.14       44.44       45.61
Precision           100         83.33       80          66.67       100         66.67       100         83.87


4. CONCLUSION 

This study aimed to investigate the potential use of an ANN and k-fold cross-validation to predict
the ripeness levels of oil palm FFB. The prediction used hyperspectral images obtained using hyperspectral
imaging. The ripeness levels were immature (unripe) and mature (ripe). An SG filter was applied to smooth the
hyperspectral data before they were fed into the neural network algorithm. The constructed ANN model used
the k-fold cross-validation method, with 7 folds, to test its performance. The performance
of the testing was evaluated using a confusion matrix. The resulting confusion matrix parameters show that the
mean of the seven per-fold accuracies of the ANN model reaches 79.48% (79.17% over all 72 test data points, as in
Table 4). The highest accuracies of 90% belong to the 2nd, 5th, and 7th folds. The results showed that
hyperspectral imaging with the SG filter, k-fold cross-validation, and an
ANN model is promising for predicting the ripeness levels of oil palm FFB. These results will be the
foundation toward using multispectral imaging in a rapid sorting and grading machine vision system for oil palm FFBs.